Fault Tolerance Against Design Faults

Author

  • Lorenzo Strigini
Abstract

This chapter surveys techniques for tolerating the effects of design defects in computer systems, paying special attention to software. Design faults are a major cause of failure in modern computer systems, and their relative importance is growing as techniques for tolerating physical faults gain wider acceptance. Although design faults could in principle be eliminated, in practice they are inevitable in many categories of systems, and designers need to apply fault tolerance to mitigate their effects. Limited degrees of fault tolerance in software – “defensive programming” – are common, but systematic application of fault tolerance for design faults is still rare and mostly limited to highly critical systems. However, the increasing dependence of system designers on off-the-shelf components often makes fault tolerance a necessary, feasible and probably cost-effective way to achieve modest dependability improvements. This chapter introduces techniques and principles, outlines similarities and differences with fault tolerance against physical faults, provides a structured description of the space of design solutions, and discusses some design issues and trade-offs.

Similar resources

Novel Defect Terminolgy Beside Evaluation And Design Fault Tolerant Logic Gates In Quantum-Dot Cellular Automata

Quantum-dot Cellular Automata (QCA) is one of the important nano-level technologies for implementing both combinational and sequential systems. QCA has the potential to achieve low power dissipation and to operate at high speed at THz frequencies. However, the high probability of fabrication defects in QCA is a fundamental challenge to the use of this emerging technology. Because of these vari...

On Fault Tolerance Methods for Networks-on-Chip

Technology scaling has proceeded into dimensions in which the reliability of manufactured devices is becoming endangered. The reliability decrease is a consequence of physical limitations, relative increase of variations, and decreasing noise margins, among others. A promising solution for bringing the reliability of circuits back to a desired level is the use of design methods which introduce ...

A generalized ABFT technique using a fault tolerant neural network

In this paper we first show that the standard BP algorithm cannot yield a uniform information distribution over the neural network architecture. A measure of sensitivity is defined to evaluate the fault tolerance of a neural network, and then we show that the sensitivity of a link is closely related to the amount of information that passes through it. Based on this assumption, we prove that the distribu...

Application-layer Fault-Tolerance Protocols

The central topic of this book is application-level fault-tolerance, that is, the methods, architectures, and tools that allow a fault-tolerant system to be expressed in the application software of our computers. Application-level fault-tolerance is a sub-class of software fault-tolerance that focuses on the problems of expressing the problems and solutions of fault-tolerance in the top layer of the ...

Randomized algorithms for reliable broadcast

In this thesis, we design randomized algorithms for classical problems in fault-tolerant distributed computing in the full-information model. The full-information model is a strong adversarial model which imposes no restrictions on the computational power of the faulty players nor on the information available to them. Namely, the faulty players are infinitely powerful and are privy to all the co...

Publication date: 2005